1. Project Summary
    • 1.1 Sample Information
    • 1.2 Raw data generation
  2. Quality control and data transformation
  3. Statistical analysis
    • 3.1 Multivariate
      • tSNE
      • UMAP
    • 3.2 Linear Mixed Models
      • Islet-present ROIs
      • Islet-absent ROIs
  4. Visualizations of differentially expressed genes for contrasts of interest
    • 4.1 KEGG class barplots
      • Islet-present ROIs
      • Islet-absent ROIs
    • 4.2 Volcano Plots
      • Islet-present ROIs
      • Islet-absent ROIs
    • 4.3 Boxplots
      • Islet-present ROIs
      • Islet-absent ROIs
    • 4.4 Heatmaps
      • Islet-present ROIs
      • Islet-absent ROIs
    • 4.5 KEGG enrichment analysis
      • Islet-present ROIs
      • Islet-absent ROIs

1 Project Summary

GeoMx DSP whole-transcriptome atlas (WTA) sequencing of regions of interest (ROIs) collected from four pancreas slides from HuBMAP P1.

1.1 Sample Information

A total of 83 samples were collected and sequenced from four slides: 4A (Head), 9A (Neck), 14A (Body), and 21A (Tail). Between 19 and 22 ROIs per slide were annotated as islet-present or islet-absent. Within ROIs, segments were defined sequentially using the PanCK+ and then the CD31+ markers. The INS+ marker failed in this experiment.

1.2 Raw data generation

Spatially targeted transcriptomic sequencing was performed on samples selected with the GeoMx Digital Spatial Profiler (DSP) using the whole transcriptome atlas (WTA) probe set. Paired-end 2x150 sequencing was performed on an Illumina NextSeq. Raw sequencing reads were processed with the GeoMx NGS Pipeline.

2 Quality control and data transformation

2.1 Per-segment quality control

No segments were removed based on per-segment QC metrics; all 83 segments passed every check.

Sample QC summary table
QC metric       Pass  Warning
LowReads          83        0
LowTrimmed        83        0
LowStitched       83        0
LowAligned        83        0
LowSaturation     83        0
LowNegatives      83        0
HighNTC           83        0
LowArea           83        0
TOTAL FLAGS       83        0

2.2 Per-probe quality control

One probe that did not pass QC was excluded.

##   Passed Global Local
## 1  18813      1     1

2.3 Filter segments with low signal by Limit of Quantification (LOQ)

The limit of quantification (LOQ) was determined for each segment, and segments and genes with abnormally low signal were filtered out to focus on true biological signal. Segments with exceptionally low signal have only a small fraction of panel genes detected above the LOQ relative to the other segments in the study. Segments with fewer than 5% of panel genes detected above the LOQ were removed; a detection threshold of 5-10% is generally reasonable for segment filtering.

Six segments fell below this detection threshold and were excluded, leaving 77 segments.

Figure: Distribution of segments by the percentage of panel genes detected above the LOQ.

Dimensions of the data after segment filtering:

## Features  Samples 
##    18677       77
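The LOQ-based segment filter described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the pipeline used for this report: it assumes each segment's LOQ is the geometric mean of its negative-probe counts multiplied by their geometric standard deviation raised to the power n = 2, with a floor of 2. These values are commonly used in the Bioconductor GeoMx workflow but are not stated in this report. The `segments` layout and its `neg` and `genes` keys are hypothetical.

```python
import math
import statistics

def geo_mean(values):
    """Geometric mean of strictly positive counts."""
    return math.exp(statistics.fmean([math.log(v) for v in values]))

def geo_sd(values):
    """Geometric SD: exp of the sample SD of the log counts (needs >= 2 values)."""
    return math.exp(statistics.stdev([math.log(v) for v in values]))

def segment_loq(neg_counts, n_sd=2.0, min_loq=2.0):
    """Per-segment LOQ from negative-probe counts (n_sd and min_loq are assumptions)."""
    loq = geo_mean(neg_counts) * geo_sd(neg_counts) ** n_sd
    return max(loq, min_loq)

def detection_rate(gene_counts, loq):
    """Fraction of panel genes whose count is above the segment's LOQ."""
    detected = sum(1 for count in gene_counts.values() if count > loq)
    return detected / len(gene_counts)

def filter_segments(segments, min_rate=0.05):
    """Keep segments with at least `min_rate` of panel genes detected above LOQ.

    segments: {segment_id: {"neg": [negative-probe counts], "genes": {gene: count}}}
    (a hypothetical layout used only for illustration).
    """
    kept = {}
    for seg_id, seg in segments.items():
        loq = segment_loq(seg["neg"])
        if detection_rate(seg["genes"], loq) >= min_rate:
            kept[seg_id] = seg
    return kept
```

With the 5% threshold used here, a segment is dropped when fewer than 1 in 20 panel genes exceed that segment's own LOQ.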

2.4 Filter out lowly detected genes

For each target gene, the percentage of segments in which it was detected above the LOQ was computed, and genes detected in fewer than 10% of segments were filtered out, leaving 7,739 genes across 77 segments.

## Features  Samples 
##     7739       77
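The gene-level filter can be sketched in the same way. This sketch assumes a gene counts as "detected" in a segment when its count exceeds that segment's LOQ, with the per-segment LOQs computed as in the previous step. The `counts` and `loqs` dictionaries are hypothetical inputs.

```python
def filter_genes(counts, loqs, min_rate=0.10):
    """Return genes detected above LOQ in at least `min_rate` of segments.

    counts: {segment_id: {gene: count}}; loqs: {segment_id: LOQ}.
    """
    seg_ids = list(counts)
    genes = list(counts[seg_ids[0]])
    kept = []
    for gene in genes:
        # Count segments where this gene exceeds that segment's own LOQ
        n_detected = sum(1 for s in seg_ids if counts[s][gene] > loqs[s])
        if n_detected / len(seg_ids) >= min_rate:
            kept.append(gene)
    return kept
```

A gene detected in exactly 10% of segments is kept, because only genes below 10% are removed.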

2.5 Q3 Normalization

Negative probe counts and each segment's upper-quartile (Q3) counts were checked for sufficient separation to ensure a stable measure of Q3 signal. Counts were then normalized by Q3.
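Q3 normalization and the separation check can be sketched as below. This is an illustrative sketch only. It assumes each segment is divided by its own 75th percentile and rescaled by the geometric mean of all segments' Q3 values, and that separation is summarized as the ratio of Q3 to the geometric mean of the negative probes. The report does not state the exact scaling convention or any ratio threshold.

```python
import math
import statistics

def q3(values):
    """Upper quartile (75th percentile) using linear interpolation."""
    return statistics.quantiles(values, n=4, method="inclusive")[2]

def q3_normalize(counts):
    """Q3-normalize {segment: {gene: count}}.

    Each segment is divided by its own Q3 and multiplied by the geometric mean
    of all segments' Q3 values, so counts stay on a count-like scale.
    """
    q3s = {s: q3(list(genes.values())) for s, genes in counts.items()}
    scale = math.exp(statistics.fmean([math.log(v) for v in q3s.values()]))
    return {
        s: {g: c / q3s[s] * scale for g, c in genes.items()}
        for s, genes in counts.items()
    }

def q3_to_negative_ratio(gene_counts, neg_counts):
    """Separation check: a segment's Q3 over the geometric mean of its negative probes."""
    gm_neg = math.exp(statistics.fmean([math.log(v) for v in neg_counts]))
    return q3(list(gene_counts.values())) / gm_neg
```

After normalization, two segments whose counts differ only by a constant factor become identical, which is the intended effect of scaling by Q3.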


3 Statistical analysis

Multivariate and univariate statistical analyses were performed to identify genes with significant differential expression between pancreas regions.

3.1 UMAP (multivariate analysis)

Uniform Manifold Approximation and Projection (UMAP) is a general-purpose nonlinear dimension reduction technique, used here to visualize the segments in a low-dimensional embedding.

3.2 tSNE (multivariate analysis)

t-SNE (t-distributed Stochastic Neighbor Embedding) is a nonlinear dimensionality reduction technique suited to embedding high-dimensional data in a low-dimensional space (2D or 3D) for visualization.

3.3 Univariate analysis

A linear mixed-effects model (LMM; R package LMM) was used to identify genes with significant differential expression between pancreas regions (body, head, neck, and tail) within each ROI type (islet-present or islet-absent). Only the top 10% most variable genes, ranked by coefficient of variation (CV), were tested. Segmentation target could not be included as a covariate because islet-present ROIs had only one segmentation type. The model formula for each data subset was: Gene ~ pancreas_region.
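Gene preselection and the set of contrasts can be sketched as follows. This sketch assumes CV is computed on normalized counts as sample SD divided by mean, and that "top 10%" rounds up to a whole number of genes. Neither detail is stated in the report. It does not fit the LMM itself. It only shows which genes enter the model and how four regions yield the six pairwise contrasts reported below.

```python
import itertools
import math
import statistics

def top_cv_genes(norm_counts, fraction=0.10):
    """Return the `fraction` of genes with the highest coefficient of variation.

    norm_counts: {segment_id: {gene: normalized count}}.
    """
    seg_ids = list(norm_counts)
    genes = list(norm_counts[seg_ids[0]])
    cv = {}
    for gene in genes:
        values = [norm_counts[s][gene] for s in seg_ids]
        mean = statistics.fmean(values)
        # CV = sample SD / mean; genes with zero mean get CV 0
        cv[gene] = statistics.stdev(values) / mean if mean > 0 else 0.0
    n_keep = max(1, math.ceil(len(genes) * fraction))
    return sorted(genes, key=lambda g: cv[g], reverse=True)[:n_keep]

def pairwise_contrasts(regions):
    """All pairwise region contrasts, labeled 'a - b' in alphabetical order."""
    return [f"{a} - {b}" for a, b in itertools.combinations(sorted(regions), 2)]
```

With the four regions, `pairwise_contrasts(["head", "neck", "body", "tail"])` produces the six labels used in the tables, from "body - head" through "neck - tail".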

Number of significantly DE genes per contrast, by ROI type and significance threshold

Contrast      Islet-present    Islet-absent     Islet-present  Islet-absent
              (adj. p < 0.05)  (adj. p < 0.05)  (p < 0.05)     (p < 0.05)
body - head   0                0                10             0
body - neck   0                6                0              105
body - tail   0                3                0              113
head - neck   0                0                0              3
head - tail   0                0                0              1
neck - tail   0                0                0              28
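The gap between the adjusted and unadjusted counts above is expected after multiple-testing correction. The report does not say which adjustment method was used. The sketch below assumes Benjamini-Hochberg (false discovery rate) adjustment, a common default, applied to one contrast's per-gene p-values.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvalues)
    # Walk from the largest p-value down, keeping a running minimum of p * n / rank
    order = sorted(range(n), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for k, i in enumerate(order):
        rank = n - k  # 1-based rank of pvalues[i] in ascending order
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted

def count_significant(pvalues, alpha=0.05):
    """Number of tests with a p-value below alpha."""
    return sum(1 for p in pvalues if p < alpha)
```

For example, raw p-values of 0.01, 0.04, 0.03, and 0.5 give three significant genes before adjustment but only one after, mirroring the pattern in the table.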

Venn Diagram

A Venn diagram shows the overlap between the DE gene lists from the three body-versus-other contrasts (body - head, body - neck, and body - tail).
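The regions of such a Venn diagram can be computed by grouping each gene according to the exact combination of contrasts in which it is DE. This is a minimal sketch. The gene names in the usage note are hypothetical.

```python
def venn_regions(gene_sets):
    """Group genes by the exact combination of lists they appear in.

    gene_sets: {contrast_label: set of DE genes}.
    Returns {frozenset of contrast labels: set of genes}.
    """
    regions = {}
    for gene in set().union(*gene_sets.values()):
        members = frozenset(label for label, genes in gene_sets.items() if gene in genes)
        regions.setdefault(members, set()).add(gene)
    return regions
```

For example, with hypothetical lists {"A", "B"}, {"B", "C"}, and {"B", "D"} for body - head, body - neck, and body - tail, gene "B" falls in the central region shared by all three contrasts.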

Table of DE genes between pancreas regions (p-value < 0.05)

To view results for a contrast of interest, click the arrow at the top of the contrast column to sort by contrast, or use the search bar to search for a contrast. To see which genes were significantly DE in more than one contrast, sort by gene and check how many contrasts are listed for each gene. The search bar can also be used to find a gene of interest.

4 Visualizations of significantly DE genes

All plots below can be zoomed, subset, and downloaded individually (as shown or after modification) using the toolbar at the top right of each figure, which appears when you hover the mouse over it. Hover over plot points to view the underlying data.

4.1 Hallmark and KEGG pathway NES from GSEA

(If a contrast is missing, no pathways were significantly enriched for that contrast.)
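The normalized enrichment score (NES) builds on the GSEA running-sum enrichment score (ES). The sketch below computes ES as the maximum deviation from zero of a weighted running sum over a ranked gene list. It then normalizes ES by the mean absolute ES of same-signed scores from random gene sets of equal size. This is an illustration, not the tool used for this report. The ranking metric, gene-set permutation null, weight of 1, and permutation count are all assumptions.

```python
import random

def enrichment_score(ranked, gene_set, weight=1.0):
    """GSEA-style weighted running-sum enrichment score.

    ranked: list of (gene, score) pairs; gene_set: iterable of gene names.
    """
    ranked = sorted(ranked, key=lambda pair: pair[1], reverse=True)
    hits = set(gene_set) & {g for g, _ in ranked}
    n, n_hit = len(ranked), len(hits)
    if n_hit == 0 or n_hit == n:
        raise ValueError("gene set must contain some, but not all, ranked genes")
    # Hits step up in proportion to |score|^weight; misses step down uniformly
    n_r = sum(abs(s) ** weight for g, s in ranked if g in hits)
    if n_r == 0:
        raise ValueError("all gene-set scores are zero")
    miss_step = 1.0 / (n - n_hit)
    running, es = 0.0, 0.0
    for gene, score in ranked:
        if gene in hits:
            running += abs(score) ** weight / n_r
        else:
            running -= miss_step
        # ES is the running sum's maximum deviation from zero
        if abs(running) > abs(es):
            es = running
    return es

def normalized_enrichment_score(ranked, gene_set, n_perm=1000, seed=0):
    """Divide ES by the mean |ES| of same-signed ES values from random gene sets."""
    rng = random.Random(seed)
    es = enrichment_score(ranked, gene_set)
    genes = [g for g, _ in ranked]
    size = len(set(gene_set) & set(genes))
    same_sign = []
    for _ in range(n_perm):
        try:
            null_es = enrichment_score(ranked, rng.sample(genes, size))
        except ValueError:
            continue  # skip random sets whose scores are all zero
        if (null_es >= 0) == (es >= 0):
            same_sign.append(abs(null_es))
    mean_null = sum(same_sign) / len(same_sign) if same_sign else 0.0
    return es / mean_null if mean_null > 0 else float("nan")
```

A positive NES indicates a pathway concentrated among genes ranked high for the contrast, and a negative NES indicates one concentrated among genes ranked low.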

4.2 Volcano Plots of genes’ fold-change and p-values for each contrast

Hover over plot points to view additional gene labels.

Genes that pass the significance cutoff but not the fold-change cutoff are orange, genes that pass the fold-change cutoff but not the significance cutoff are black, genes that pass both cutoffs are red, and genes that pass neither are gray. If a color is missing from a plot, no genes fall in that category. The adjusted p-value cutoff is 0.05 and the fold-change cutoff is 1.5.
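The color rules above can be expressed as a small classification function. This sketch assumes the fold-change cutoff is applied symmetrically to up- and down-regulated genes on the log2 scale, so that |log2 FC| >= log2(1.5). It also assumes significance means an adjusted p-value strictly below 0.05. The function name is hypothetical.

```python
import math

def volcano_category(log2_fc, padj, fc_cutoff=1.5, p_cutoff=0.05):
    """Return the volcano-plot color for one gene."""
    passes_fc = abs(log2_fc) >= math.log2(fc_cutoff)
    passes_p = padj < p_cutoff
    if passes_fc and passes_p:
        return "red"     # passes both cutoffs
    if passes_p:
        return "orange"  # significant only
    if passes_fc:
        return "black"   # fold change only
    return "gray"        # passes neither
```

For example, a gene with log2 FC of 1.0 and adjusted p of 0.01 is red, because 1.0 exceeds log2(1.5), which is about 0.585.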

4.3 Boxplot of any of the top 20 significantly DE genes by pancreas region

4.4 Heatmaps of top 20 DE genes per contrast

Hover over the heatmap and select a subset of genes or samples of interest to export a zoomed-in subfigure.
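Preparing a per-contrast heatmap can be sketched in two steps: choose the top genes and scale each gene across samples. This sketch assumes genes are ranked by p-value and that rows are z-scored (mean 0, SD 1 per gene) before plotting, a common heatmap convention. The report does not state the ranking metric or the scaling used.

```python
import statistics

def top_genes(results, n=20):
    """Names of the n genes with the smallest p-values; results: [(gene, p)]."""
    return [gene for gene, _ in sorted(results, key=lambda r: r[1])[:n]]

def zscore_rows(matrix):
    """Scale each gene's values across samples to mean 0 and SD 1 (constant rows -> 0)."""
    scaled = {}
    for gene, values in matrix.items():
        mean = statistics.fmean(values)
        sd = statistics.stdev(values)
        scaled[gene] = [(v - mean) / sd if sd > 0 else 0.0 for v in values]
    return scaled
```

Row scaling makes the heatmap colors reflect each gene's relative pattern across regions rather than its absolute expression level.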